In this file, we will perform compositional analysis of our scaled/noramlized/rarefied microbial dataset!
scaled_physeq.RData, which includes a
rooted tree that we created in
analysis/06_Ordination/scaled_physeq.RData.Microbial abundance data—like 16S rRNA gene or metagenomics data—are typically compositional: they represent relative abundances constrained to a constant total (e.g., percent or proportions). This introduces spurious correlations and other issues if analyzed with traditional statistics. This is a very important limitation to microbial data!
INTERPRETATION #1: What are the limitations of interpreting microbial abundance from relative (aka. compositional) rather than absolute counts?
The following phyloseq object contains microbial community composition data in a standardized format. In this case, we’ve already normalized the reads (scaled to 1,942 per sample), which is essential for comparing relative abundances.
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 2535 taxa and 88 samples ]
## sample_data() Sample Data: [ 88 samples by 43 sample variables ]
## tax_table() Taxonomy Table: [ 2535 taxa by 9 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 2535 tips and 2534 internal nodes ]
## [1] 1942
## [1] 1942 1990
In this analysis, we will drill down from phylum to ASV level analyses, which will enable increasingly detailed insights into microbial diversity and potential ecological roles. However, it is also important for us to remember that deeper levels also come with increased noise and data sparsity, especially for rare groups.
Taxonomic analysis often begins at broader levels (e.g., Phylum) to visualize overarching trends before zooming in on finer-scale patterns. This step allows us to identify which microbial groups dominate across samples and which may respond to environmental gradients like salinity.
Now, let’s calculate the relative abundance of the phyla across all the samples. NOTE: To do this, it is imperative that we have scaled the data to a constant sequencing depth.
# Create a phylum level dataframe
phylum_df <-
scaled_physeq %>%
# Agglomerate all ASV counts within a phylum
tax_glom(taxrank = "Phylum") %>%
# Calculate the relative abundance!
transform_sample_counts(function(x) {x/sum(x)}) %>%
# Create a dataframe from phyloseq object
psmelt() %>%
# Filter out Phyla < 1 %
#dplyr::filter(Abundance > 0.01) %>%
# fix the order of date
mutate(date = fct_relevel(date, c("6/2/21", "6/15/21", "10/5/21")),
# Fix the station order
station = fct_relevel(station, c("Copano West", "Copano East",
"Mesquite Bay", "Aransas Bay",
"Shipping Channel")))
## What are the phylum abundances?
phylum_df %>%
group_by(Phylum) %>%
summarize(mean_PercAbund = round(mean(Abundance), digits = 4)) %>%
arrange(-mean_PercAbund) %>%
datatable()# Make a list of phyla the top phyla
top10_phyla <-
phylum_df %>%
group_by(Phylum) %>%
summarize(mean_PercAbund = mean(Abundance)) %>%
arrange(-mean_PercAbund) %>%
head(n = 10) %>%
pull(Phylum)INTERPRETATION #2: Which phyla in your dataset have the highest relative abundances? What are their abundances? We will focus on these phyla for plotting below. :)
Visualization helps detect patterns in composition and abundance that statistical models may later test. Stacked bar plots are often used but can obscure individual sample variation and visualize too much data all at once.
Therefore, we will also explore faceted and jittered boxplots below the bar plots to see sample-level trends more clearly.
# Stacked Bar Plot With All phyla
# Plot Phylum Abundances - make sure to load phylum_colors
phylum_df %>%
dplyr::filter(Phylum %in% top10_phyla) %>%
# Warning: It's important to have one sample per x value,
# otherwise, it will take the sum between multiple samples
dplyr::filter(depth == 0.0) %>%
dplyr::filter(fraction == "Whole") %>%
ggplot(aes(x = date, y = Abundance, fill = Phylum)) +
facet_grid(.~station) +
geom_bar(stat = "identity", color = "black") +
labs(title = "Top 10 Phyla: Surface Samples") +
scale_fill_manual(values = phylum_colors) +
theme_bw() +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))To help compare the phylum abundance between sample types, we can facet by phylum to better see how the changes occur across the stations, which is masked in the stacked bar plot. It’s a little better than the stacked bar plot, however, we can do even better!
phylum_df %>%
dplyr::filter(Phylum %in% top10_phyla) %>%
# Individual sample comparison of surface waters, whole fraction
# This will allow us to look at individual samples!
# Note that whenever plotting stacked bar plots, you should always look at Individual samples!
dplyr::filter(depth == 0.0) %>%
dplyr::filter(fraction == "Whole") %>%
ggplot(aes(x = station, y = Abundance, fill = Phylum)) +
facet_grid(Phylum~date, scale = "free") +
# add the stacked bar
geom_bar(stat = "identity", color = "black") +
# change the colors to be our selected colors
scale_fill_manual(values = phylum_colors) +
theme_bw() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))### Or combined together:
phylum_df %>%
dplyr::filter(Phylum %in% top10_phyla) %>%
ggplot(aes(x = station, y = Abundance, fill = Phylum, color = Phylum)) +
facet_grid(Phylum~date, scale = "free") +
# add the stacked bar
geom_jitter() +
geom_boxplot(outlier.shape = NA, alpha = 0.5) +
# change the colors to be our selected colors
scale_fill_manual(values = phylum_colors) +
scale_color_manual(values = phylum_colors) +
theme_bw() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))### Or combined together:
phylum_df %>%
dplyr::filter(Phylum %in% top10_phyla) %>%
ggplot(aes(x = station, y = Abundance, fill = Phylum, color = Phylum)) +
facet_wrap(Phylum~., scales = "free", nrow = 2) +
# add the stacked bar
geom_jitter() +
geom_boxplot(outlier.shape = NA, alpha = 0.5) +
# change the colors to be our selected colors
scale_fill_manual(values = phylum_colors) +
scale_color_manual(values = phylum_colors) +
theme_bw() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))Microbial community composition varied markedly across the estuarine salinity gradient! Specifically, several phyla display clear spatial structuring, which is exciting. This figure highlights strong phylum-level ecological filtering across salinity gradients, with Actinomycetota and Cyanobacteriota showing the most pronounced spatial turnover. Pseudomonadota and Bacteroidota appear more ubiquitous, while other groups reflect niche partitioning or rare biosphere dynamics.
Some specific results:
After initial exploration, we focus on specific phyla that appear to vary across stations or with salinity. These targeted plots help develop hypotheses about ecological drivers.
INTERPRETATION #3: Make the above dotplot and boxplot for your sample types and your most abundant phyla. What initial conclusions can you draw regarding the sampled microbial communities as it relates to your scientific question?
# Narrow in on a specific group
# Actinomycetota - y: abundance, x: station, dot plot + boxplot
actino_phylum_station <-
phylum_df %>%
dplyr::filter(Phylum == "Actinomycetota") %>%
# build the plot
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) + # outliers not plotted here in boxplot
geom_jitter() +
theme_bw() +
labs(title = "Actinomycetota Phylum") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "right")
# Statistically: Kruskall-Wallis followed by a Tukey's Posthoc test
# These are non-parametric (non-normal) stat tests
# CONTINUOUS
actino_phylum_salinity <-
phylum_df %>%
dplyr::filter(Phylum == "Actinomycetota") %>%
ggplot(aes(x = salinity_psu, y = Abundance)) +
geom_point(aes(color = station)) +
theme_bw() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
labs(title = "Actinomycetota Phylum") +
scale_color_manual(values = station_colors) +
theme(legend.position = "none")
# Collect both of the plots together into one
actino_phylum_station + actino_phylum_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")Actinomycetota Abundance Across Stations and Salinity
A. Actinomycetota relative abundance is highest in low-salinity stations (Copano West and East) and steadily declines moving toward more marine stations (Aransas Bay and Shipping Channel). B. Abundance shows a strong negative relationship with salinity, consistent with a freshwater preference. This taxon appears to be sensitive to increasing salinity, with almost no presence in high-salinity samples.
Actinomycetes are Gram-positive bacteria known for decomposing complex organic matter, especially in soils and freshwater systems. Their decline with increasing salinity suggests sensitivity to osmotic stress and highlights their role in freshwater nutrient cycling and carbon turnover.
# Narrow in on a specific group
# Cyanobacteriota - y: abundance, x: station, dot plot + boxplot
# Plot by station
cyano_phylum_station <-
phylum_df %>%
dplyr::filter(Phylum == "Cyanobacteriota") %>%
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
# outliers not plotted here in boxplot
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
geom_jitter() +
theme_bw() +
labs(title = "Cyanobacteriota Phylum") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "right")
# Statistically: Kruskall-Wallis followed by a Tukey's Posthoc test
# These are non-parametric (non-normal) stat tests
# CONTINUOUS
cyano_phylum_salinity <-
phylum_df %>%
dplyr::filter(Phylum == "Cyanobacteriota") %>%
# build the plot
ggplot(aes(x = salinity_psu, y = Abundance)) +
geom_point(aes(color = station, shape = date)) +
theme_bw() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
labs(title = "Cyanobacteriota Phylum") +
scale_color_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Collect the Cyanobacteriota Plots
cyano_phylum_station + cyano_phylum_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")Cyanobacteriota Abundance Across Stations and Salinity
A. Cyanobacteriota relative abundance varies across estuarine stations, with generally higher median abundances observed in mid-salinity environments (e.g., Mesquite and Aransas Bays) compared to the more freshwater (Copano) and marine-influenced (Shipping Channel) sites. B. When plotted against salinity, Cyanobacteriota abundance follows a unimodal (hump-shaped) trend, peaking around intermediate salinity values (~15 PSU). This suggests that Cyanobacteriota in this system may be best adapted to brackish conditions.
Cyanobacteria are globally important primary producers, contributing significantly to oxygen production and carbon cycling. In estuaries, they often dominate in brackish waters and include both free-living and bloom-forming taxa. Their response to salinity helps indicate shifts in primary production potential and trophic dynamics.
# Narrow in on a specific group
# Verrucomicrobiota - y: abundance, x: station, dot plot + boxplot
# Plot by station
verruco_phylum_station <-
phylum_df %>%
dplyr::filter(Phylum == "Verrucomicrobiota") %>%
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
# outliers not plotted here in boxplot
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
geom_jitter() +
theme_bw() +
labs(title = "Verrucomicrobiota Phylum") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "right")
# Statistically: Kruskall-Wallis followed by a Tukey's Posthoc test
# These are non-parametric (non-normal) stat tests
# CONTINUOUS
verruco_phylum_salinity <-
phylum_df %>%
dplyr::filter(Phylum == "Verrucomicrobiota") %>%
ggplot(aes(x = salinity_psu, y = Abundance)) +
geom_point(aes(color = station, shape = date)) +
theme_bw() +
geom_smooth(method = "lm") +
labs(title = "Verrucomicrobiota Phylum") +
scale_color_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Collect the Verrucomicrobiota Plots
verruco_phylum_station + verruco_phylum_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")## `geom_smooth()` using formula = 'y ~ x'
Verrucomicrobiota Abundance Across Stations and Salinity
A. Verrucomicrobiota are most abundant in lower salinity sites (Copano West and East) and show a decreasing trend in abundance toward more saline stations (Shipping Channel). B. There is a clear negative correlation between Verrucomicrobiota abundance and salinity, suggesting this phylum may prefer fresher conditions and is less competitive in higher salinity environments.
This diverse phylum is often associated with polysaccharide degradation and has been found in soils, freshwater, and marine systems. In estuarine environments, their presence in low salinity zones suggests a role in terrestrial or freshwater organic matter processing.
# Narrow in on a specific group
# Bacteroidota - y: abundance, x: station, dot plot + boxplot
# Plot by station
bacteroidota_phylum_station <-
phylum_df %>%
dplyr::filter(Phylum == "Bacteroidota") %>%
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
# outliers not plotted here in boxplot
geom_boxplot(alpha = 0.5, outlier.shape = NA) +
geom_jitter() +
theme_bw() +
labs(title = "Bacteroidota Phylum") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "right")
# Statistically: Kruskall-Wallis followed by a Tukey's Posthoc test
# These are non-parametric (non-normal) stat tests
# CONTINUOUS
bacteroidota_phylum_salinity <-
phylum_df %>%
dplyr::filter(Phylum == "Bacteroidota") %>%
# build the plot
ggplot(aes(x = salinity_psu, y = Abundance)) +
geom_point(aes(color = station, shape = date)) +
theme_bw() +
geom_smooth(method = "lm") +
labs(title = "Bacteroidota Phylum") +
scale_color_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Collect the Bacteroidota Plots
bacteroidota_phylum_station + bacteroidota_phylum_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")## `geom_smooth()` using formula = 'y ~ x'
Bacteroidota Abundance Across Stations and Salinity
A. Unlike the other phyla, Bacteroidota relative abundance increases along the estuarine salinity gradient, with the highest levels observed in the most marine-influenced site (Shipping Channel). B. Abundance shows a strong positive correlation with salinity, indicating this phylum may be better adapted to marine or higher-salinity conditions, and potentially involved in degradation of organic matter under saline conditions.
Members of Bacteroidota are prominent degraders of high-molecular-weight organic matter such as proteins and polysaccharides. Their increasing abundance with salinity implies they may be key players in coastal and marine carbon cycling, especially in environments rich in organic particles.
| Phylum | Salinity Trend | Peak/Preference | Interpretation |
|---|---|---|---|
| Cyanobacteriota | Unimodal (hump-shaped) | Brackish (~15 PSU) | Thrives in intermediate salinity; may be adapted to estuarine conditions. |
| Actinomycetota | Negative | Freshwater (low salinity) | Strongly freshwater-associated; declines with increasing salinity. |
| Verrucomicrobiota | Negative | Freshwater (low salinity) | More abundant in low-salinity environments; possibly outcompeted in saline sites. |
| Bacteroidota | Positive | Marine (high salinity) | Increases with salinity; likely involved in marine organic matter degradation. |
INTERPRETATION #4: Based on these phylum-level analyses, were there any phyla that you were able to identify to focus on moving forward for deeper taxonomic analyses? Why?
Let’s first calculate the genus data frame.
# Calculate the Family relative abundance
# Note: The read depth MUST be normalized in some way: scale_reads
genus_df <-
scaled_physeq %>%
# agglomerate at the phylum level
tax_glom(taxrank = "Genus") %>%
# Transform counts to relative abundance
transform_sample_counts(function (x) {x/sum(x)}) %>%
# Melt to a long format
psmelt() %>%
# Fix the order of date
mutate(date = fct_relevel(date, c("6/2/21", "6/15/21", "10/5/21")),
station = fct_relevel(station, c("Copano West", "Copano East",
"Mesquite Bay", "Aransas Bay",
"Shipping Channel")))# Actinobacteriota
# Plot genus
actino_genus_station <-
genus_df %>%
dplyr::filter(Phylum == "Actinomycetota") %>%
# At first, plot all of the genera and then subset the ones that have intersting trends
dplyr::filter(Genus %in% c("hgcI clade", "CL500-29 marine group")) %>%
# build the plot
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) + # outliers not plotted here in boxplot
geom_jitter() +
theme_bw() +
labs(title = "Actinomycetota Genera") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Plot genus: Continuous
actino_genus_salinity <-
genus_df %>%
dplyr::filter(Phylum == "Actinomycetota") %>%
dplyr::filter(Genus %in% c("hgcI clade", "CL500-29 marine group")) %>%
# build the plot
ggplot(aes(x = salinity_psu, y = Abundance)) +
facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
geom_point(aes(color = station)) + theme_bw() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
labs(title = "Actinomycetota Genera") +
scale_color_manual(values = station_colors) +
theme(legend.position = "none")
# Collect the Actinomycetota Plots
actino_genus_station / actino_genus_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")Genera-Level Response of Actinomycetota to Station and Salinity
This figure shows the relative abundance of two dominant Actinomycetota genera—CL500-29 marine group and hgcI clade—across estuarine stations (A) and a salinity gradient (B).
These patterns reinforce the role of salinity as a key environmental filter in structuring Actinomycetota populations and suggest functional or physiological constraints that limit these genera to distinct salinity niches.
Let’s take a look at two important and very abundant Cyano Genera:
# Plot genus: Categorical
cyano_genus_station <-
genus_df %>%
dplyr::filter(Phylum == "Cyanobacteriota") %>%
dplyr::filter(Genus %in% c("Cyanobium PCC-6307", "Synechococcus CC9902")) %>%
# build the plot
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) + # outliers not plotted here in boxplot
geom_jitter() +
theme_bw() +
labs(title = "Cyanobacteriota Genus Abundance") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Plot genus: Continuous
cyano_genus_salinity <-
genus_df %>%
dplyr::filter(Phylum == "Cyanobacteriota") %>%
dplyr::filter(Genus %in% c("Cyanobium PCC-6307", "Synechococcus CC9902")) %>%
# build the plot
ggplot(aes(x = salinity_psu, y = Abundance)) +
facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
geom_point(aes(color = station)) + theme_bw() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
labs(title = "Cyanobacteriota Genus Abundance") +
scale_color_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Collect the Cyanobacteriota Plots
cyano_genus_station / cyano_genus_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")Genus-Level Response of Cyanobacteriota to Salinity
The above figure displays how two cyanobacterial genera—Cyanobium PCC-6307 and Synechococcus CC9902—respond to a continuous salinity gradient across estuarine environments. We can conclude that:
Together, these patterns illustrate niche partitioning within Cyanobacteriota along environmental gradients and emphasize how salinity acts as a strong ecological filter shaping microbial community composition at fine taxonomic resolution.
INTERPRETATION #5: Please make some plots of mid-level taxonomic ranks (e.g. Genus, Family, Order) as it relates to your scientific question. Were there any changes in the mid-level taxonomic trends compared to the phylum-level results? What does that tell you about that taxonomic group?
Now, let’s take a look at the ASVs. This is where the real ecology/biology happens! This is because the ASV-level plots will provide us with a more detailed view of which specific taxa (ASVs) are driving the overall trends seen at the higher taxonomic levels (e.g. phylum) in relation to salinity gradients and station differences.
Before we calculate the ASV-level abundances, let’s take a second to think about who we’d like to consider. There’s a lot of ASVs! In fact, there are 2535! It is difficult analyze all of them! Therefore, we will remove ASVs that have an overall count across the entire dataset of 196 or 1% of a scaled sample to 1962. This is a little arbitrary and therefore, you may choose a different threshold for your dataset. The goal here is to dramatically decreases the number of ASVs in the dataset to help make the ASV-level data analysis much easier.
Of course, if we had more time in class, we could run differential abundance and this would also help us to identify and statistically test which ASVs are important and also help us narrow on specific ASVs.
The goal with this lesson is to do some data exploration and then also test ASVs that are relevant for our specific study. Let’s go!
# Calculate the Family relative abundance
# Note: The read depth MUST be normalized in some way: scale_reads
ASV_df <-
scaled_physeq %>%
# Prune out ASVs that have fewer than 100 counts!
## LOOK AT HOW MANY ARE REMOVED! We scaled to 1,962 reads!
prune_taxa(taxa_sums(.) >= 196, .) %>%
# agglomerate at the phylum level
tax_glom(taxrank = "ASV") %>%
# Transform counts to relative abundance
transform_sample_counts(function (x) {x/sum(x)}) %>%
# Melt to a long format
psmelt() %>%
# fix the order of date
mutate(date = fct_relevel(date, c("6/2/21", "6/15/21", "10/5/21")),
station = fct_relevel(station, c("Copano West", "Copano East",
"Mesquite Bay", "Aransas Bay",
"Shipping Channel")))# Calculate top couple of ASVs
# Make a list of phyla the top phyla
top_actino_ASVs <-
ASV_df %>%
dplyr::filter(Phylum == "Actinomycetota") %>%
group_by(ASV) %>%
summarize(mean_Abundance = mean(Abundance)) %>%
dplyr::filter(mean_Abundance > 0.005) %>%
pull(ASV)
# Actinobacteriota
# Plot ASVs
actino_asv_station <-
ASV_df %>%
dplyr::filter(ASV %in% top_actino_ASVs) %>%
# build the plot
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
facet_wrap(Genus~ASV, scales = "free_y", nrow = 2) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) + # outliers not plotted here in boxplot
geom_jitter() +
theme_bw() +
labs(title = "Actinomycetota ASVs") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Plot ASVs: Continuous
actino_asv_salinity <-
ASV_df %>%
dplyr::filter(ASV %in% top_actino_ASVs) %>%
# build the plot
ggplot(aes(x = salinity_psu, y = Abundance)) +
facet_wrap(Genus~ASV, scales = "free_y", nrow = 2) +
geom_point(aes(color = station)) + theme_bw() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
labs(title = "Actinomycetota ASVs") +
scale_color_manual(values = station_colors) +
theme(legend.position = "none")
# Collect the Actinomycetota Plots
actino_asv_station / actino_asv_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")The Actinomycetota phylum-level pattern of an abundance decline with salinity is not driven by a single taxon, but rather reflects a consistent pattern across multiple ASVs, most of which show monotonic or sharply declining abundance across increasing salinity. This points to a shared sensitivity to salinity within this group, likely shaping their distribution in estuarine gradients.
A. Relationship to Station: - All dominant ASVs are most abundant at low-salinity sites (Copano West and East). - Most ASVs decline dramatically across the salinity gradient, especially in Aransas Bay and the Shipping Channel. B. Relationship to Salinity: - ASV_0006 and ASV_0048 (hgcl clade) show a strong negative relationship with salinity—nearly disappearing at salinities >20 PSU. - ASV_0017 (Candidatus Aquiluna) shows a slight unimodal response, peaking at mid-range salinities but still declining overall. - ASV_0028 (ML602J-51) also follows a clear negative trend, matching the phylum-level pattern.
Some interpretations
Let’s narrow in on more abundant ASVs by removing any ASVs that are less than 0.5% abundance. Otherwise there were many ASVs in this group!
# Calculate top couple of ASVs
# Make a list of phyla the top phyla
top_cyano_ASVs <-
ASV_df %>%
dplyr::filter(Phylum == "Cyanobacteriota") %>%
group_by(ASV) %>%
summarize(mean_Abundance = mean(Abundance)) %>%
dplyr::filter(mean_Abundance > 0.005) %>%
pull(ASV)
# Plot ASVs: Station
cyano_asv_station <-
ASV_df %>%
# Subset for more abundant ASVs
dplyr::filter(ASV %in% top_cyano_ASVs) %>%
# build the plot
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
facet_wrap(Genus~ASV, scales = "free_y", nrow = 2) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) + # outliers not plotted here in boxplot
geom_jitter() +
theme_bw() +
labs(title = "Cyanobacteriota ASVs > 0.05%") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Plot ASVs: Continuous
cyano_asv_salinity <-
ASV_df %>%
dplyr::filter(ASV %in% top_cyano_ASVs) %>%
# build the plot
ggplot(aes(x = salinity_psu, y = Abundance)) +
facet_wrap(Genus~ASV, scales = "free_y", nrow = 2) +
geom_point(aes(color = station)) + theme_bw() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
labs(title = "Cyanobacteriota ASVs > 0.05%") +
scale_color_manual(values = station_colors) +
theme(legend.position = "none")
# Collect the Actinomycetota Plots
cyano_asv_station / cyano_asv_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")Ok, this is interesting! While the phylum-level plot shows a clear unimodal relationship with salinity, the ASV-level analysis reveals taxon-specific responses, with Cyanobium peaking at moderate salinity and Synechococcus increasing with high salinity. This suggests differential adaptation strategies and niche differentiation within Cyanobacteriota in response to salinity gradients.
A. Relationship to Station: This spatial pattern reinforces the environmental filtering hypothesis, where salinity selects for different cyanobacterial taxa. - Shipping Channel, the site with the highest salinity, a spike in Synechococcus ASVs and a concomitant decline or absence of many Cyanobium ASVs - Copano West/East, with lower salinities, are dominated by Cyanobium ASVs B. Relationship to Salinity: Salinity shapes niche partitioning among Cyanobacteria! Overall, Cyanobium ASVs tend to dominate lower to intermediate salinity zones and Synechococcus ASVs tend to increase in higher salinity environments, such as the Shipping Channel. Specifically: - Several ASVs (e.g., ASV_0001, ASV_0002, both Cyanobium) follow a unimodal trend, peaking at intermediate salinity (~15–20 PSU)—mirroring the phylum-level curve. - Synechococcus ASVs (e.g., ASV_0005, ASV_0025) show increasing abundance with higher salinity, suggesting they are more halotolerant or marine-adapted. - Some Cyanobium ASVs (e.g., ASV_0012, ASV_0043) show declines at high salinity.
INTERPRETATION #6: Were the most abundant ASVs in your dataset represented in your ASV-level plots? If not, please go back and plot them specifically. (Hint: Remember that we ordered our ASV names by abundance across the entire dataset. Therefore, ASV_0001 will be the most abundant, ASV_0007 will be the 7th most abundant and so on.)
INTERPRETATION #7: Please make some plots at the ASV level as it relates to your scientific question. Were there any ASVs that were especially related to trends that you saw? Did a specific phylogenetic group all show the same trends (e.g. like the Actinomycetota ASVs above)? Or did the ASVs have various, likely niche-specific responses (similar to the Cyanobacteriota above)??
INTERPRETATION #8: Now, do a quick literature or google search and look back in the paper that you may be replicating (if applicable). Does your ASV-level changes represent something that is currently known? Or is this a novel result? What did you learn about this ASV or a group of ASVs in your dataset?
# Calculate top couple of ASVs
# Make a list of phyla the top phyla
top_verruco_ASVs <-
ASV_df %>%
dplyr::filter(Phylum == "Verrucomicrobiota") %>%
group_by(ASV) %>%
summarize(mean_Abundance = mean(Abundance)) %>%
dplyr::filter(mean_Abundance > 0.005) %>%
pull(ASV)
# Plot ASVs: Station
verruco_asv_station <-
ASV_df %>%
# Subset for more abundant ASVs
dplyr::filter(ASV %in% top_verruco_ASVs) %>%
# build the plot
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
facet_wrap(Genus~ASV, scales = "free_y", nrow = 1) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) + # outliers not plotted here in boxplot
geom_jitter() +
theme_bw() +
labs(title = "Verrucomicrobiota ASVs > 0.05%") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Plot ASVs: Continuous
verruco_asv_salinity <-
ASV_df %>%
dplyr::filter(ASV %in% top_verruco_ASVs) %>%
# build the plot
ggplot(aes(x = salinity_psu, y = Abundance)) +
facet_wrap(Genus~ASV, scales = "free_y", nrow = 1) +
geom_point(aes(color = station)) + theme_bw() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
labs(title = "Verrucomicrobiota ASVs > 0.05%") +
scale_color_manual(values = station_colors) +
theme(legend.position = "none")
# Collect the Actinomycetota Plots
verruco_asv_station / verruco_asv_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")# Calculate top couple of ASVs
# Make a list of phyla the top phyla
top_bacter_ASVs <-
ASV_df %>%
dplyr::filter(Phylum == "Bacteroidota") %>%
group_by(ASV) %>%
summarize(mean_Abundance = mean(Abundance)) %>%
dplyr::filter(mean_Abundance > 0.005) %>%
pull(ASV)
# Plot ASVs: Station
bacter_asv_station <-
ASV_df %>%
# Subset for more abundant ASVs
dplyr::filter(ASV %in% top_bacter_ASVs) %>%
# build the plot
ggplot(aes(x = station, y = Abundance,
fill = station, color = station)) +
facet_wrap(Genus~ASV, scales = "free_y", nrow = 2) +
geom_boxplot(alpha = 0.5, outlier.shape = NA) + # outliers not plotted here in boxplot
geom_jitter() +
theme_bw() +
labs(title = "Bacteroidota ASVs > 0.05%") +
scale_color_manual(values = station_colors) +
scale_fill_manual(values = station_colors) +
theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1),
legend.position = "none")
# Plot ASVs: Continuous
bacter_asv_salinity <-
ASV_df %>%
dplyr::filter(ASV %in% top_bacter_ASVs) %>%
# build the plot
ggplot(aes(x = salinity_psu, y = Abundance)) +
facet_wrap(Genus~ASV, scales = "free_y", nrow = 2) +
geom_point(aes(color = station)) + theme_bw() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
labs(title = "Bacteroidota ASVs > 0.05%") +
scale_color_manual(values = station_colors) +
theme(legend.position = "none")
# Collect the Actinomycetota Plots
bacter_asv_station / bacter_asv_salinity +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")For reproducibility
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.2.3 (2023-03-15)
## os Rocky Linux 9.4 (Blue Onyx)
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/New_York
## date 2025-04-23
## pandoc 3.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## ade4 1.7-23 2025-02-14 [1] CRAN (R 4.2.3)
## ape 5.8-1 2024-12-16 [1] CRAN (R 4.2.3)
## Biobase 2.58.0 2022-11-01 [1] Bioconductor
## BiocGenerics 0.44.0 2022-11-01 [1] Bioconductor
## biomformat 1.26.0 2022-11-01 [1] Bioconductor
## Biostrings 2.66.0 2022-11-01 [1] Bioconductor
## bitops 1.0-9 2024-10-03 [2] CRAN (R 4.2.3)
## bslib 0.9.0 2025-01-30 [1] CRAN (R 4.2.3)
## cachem 1.1.0 2024-05-16 [1] CRAN (R 4.2.3)
## cli 3.6.4 2025-02-13 [1] CRAN (R 4.2.3)
## cluster 2.1.4 2022-08-22 [2] CRAN (R 4.2.3)
## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.2.3)
## colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.2.3)
## crayon 1.5.3 2024-06-20 [1] CRAN (R 4.2.3)
## crosstalk 1.2.1 2023-11-23 [1] CRAN (R 4.2.1)
## data.table 1.17.0 2025-02-22 [1] CRAN (R 4.2.3)
## devtools * 2.4.5 2022-10-11 [1] CRAN (R 4.2.3)
## digest 0.6.37 2024-08-19 [1] CRAN (R 4.2.3)
## dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.2.1)
## DT * 0.33 2024-04-04 [1] CRAN (R 4.2.3)
## ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.2.3)
## evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.2.3)
## farver 2.1.2 2024-05-13 [2] CRAN (R 4.2.3)
## fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.2.3)
## forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.1)
## foreach 1.5.2 2022-02-02 [1] CRAN (R 4.2.1)
## fs 1.6.5 2024-10-30 [1] CRAN (R 4.2.3)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
## GenomeInfoDb 1.34.9 2023-02-02 [1] Bioconductor
## GenomeInfoDbData 1.2.9 2023-03-07 [1] Bioconductor
## ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.2.1)
## glue 1.8.0 2024-09-30 [1] CRAN (R 4.2.3)
## gtable 0.3.6 2024-10-25 [1] CRAN (R 4.2.3)
## hms 1.1.3 2023-03-21 [1] CRAN (R 4.2.1)
## htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.2.3)
## htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.2.1)
## httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.2.3)
## igraph 2.1.4 2025-01-23 [1] CRAN (R 4.2.3)
## IRanges 2.32.0 2022-11-01 [1] Bioconductor
## iterators 1.0.14 2022-02-05 [1] CRAN (R 4.2.1)
## jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.2.3)
## jsonlite 1.9.1 2025-03-03 [1] CRAN (R 4.2.3)
## knitr 1.50 2025-03-16 [1] CRAN (R 4.2.3)
## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.2.1)
## later 1.4.1 2024-11-27 [1] CRAN (R 4.2.3)
## lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.3)
## lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.2.1)
## lubridate * 1.9.4 2024-12-08 [1] CRAN (R 4.2.3)
## magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.2.3)
## MASS 7.3-58.2 2023-01-23 [2] CRAN (R 4.2.3)
## Matrix 1.6-5 2024-01-11 [1] CRAN (R 4.2.1)
## memoise 2.0.1 2021-11-26 [2] CRAN (R 4.2.3)
## mgcv 1.8-42 2023-03-02 [2] CRAN (R 4.2.3)
## mime 0.13 2025-03-17 [1] CRAN (R 4.2.3)
## miniUI 0.1.1.1 2018-05-18 [2] CRAN (R 4.2.3)
## multtest 2.54.0 2022-11-01 [1] Bioconductor
## munsell 0.5.1 2024-04-01 [2] CRAN (R 4.2.3)
## nlme 3.1-162 2023-01-31 [2] CRAN (R 4.2.3)
## pacman 0.5.1 2019-03-11 [1] CRAN (R 4.2.3)
## patchwork * 1.3.0.9000 2025-02-19 [1] Github (thomasp85/patchwork@2695a9f)
## permute 0.9-7 2022-01-27 [1] CRAN (R 4.2.1)
## phyloseq * 1.42.0 2022-11-01 [1] Bioconductor
## pillar 1.10.1 2025-01-07 [1] CRAN (R 4.2.3)
## pkgbuild 1.4.5 2024-10-28 [2] CRAN (R 4.2.3)
## pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.2.3)
## pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.2.3)
## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.2.1)
## profvis 0.4.0 2024-09-20 [2] CRAN (R 4.2.3)
## promises 1.3.2 2024-11-28 [1] CRAN (R 4.2.3)
## purrr * 1.0.4 2025-02-05 [1] CRAN (R 4.2.3)
## R6 2.6.1 2025-02-15 [1] CRAN (R 4.2.3)
## Rcpp 1.0.14 2025-01-12 [1] CRAN (R 4.2.3)
## RCurl 1.98-1.17 2025-03-22 [1] CRAN (R 4.2.3)
## readr * 2.1.5 2024-01-10 [1] CRAN (R 4.2.1)
## remotes 2.5.0 2024-03-17 [1] CRAN (R 4.2.3)
## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.2.1)
## rhdf5 2.42.1 2023-04-07 [1] Bioconductor
## rhdf5filters 1.10.1 2023-03-24 [1] Bioconductor
## Rhdf5lib 1.20.0 2022-11-01 [1] Bioconductor
## rlang 1.1.5 2025-01-17 [1] CRAN (R 4.2.3)
## rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.2.3)
## rstudioapi 0.17.1 2024-10-22 [2] CRAN (R 4.2.3)
## S4Vectors 0.36.2 2023-02-26 [1] Bioconductor
## sass 0.4.9 2024-03-15 [1] CRAN (R 4.2.3)
## scales 1.3.0 2023-11-28 [1] CRAN (R 4.2.1)
## sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.2.3)
## shiny 1.10.0 2024-12-14 [1] CRAN (R 4.2.3)
## stringi 1.8.4 2024-05-06 [1] CRAN (R 4.2.3)
## stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.2.1)
## survival 3.5-3 2023-02-12 [2] CRAN (R 4.2.3)
## tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.2.1)
## tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.2.1)
## tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.2.3)
## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.2.3)
## timechange 0.3.0 2024-01-18 [1] CRAN (R 4.2.1)
## tzdb 0.5.0 2025-03-15 [1] CRAN (R 4.2.3)
## urlchecker 1.0.1 2021-11-30 [2] CRAN (R 4.2.3)
## usethis * 3.0.0 2024-07-29 [2] CRAN (R 4.2.3)
## vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.2.1)
## vegan 2.6-10 2025-01-29 [1] CRAN (R 4.2.3)
## withr 3.0.2 2024-10-28 [1] CRAN (R 4.2.3)
## xfun 0.51 2025-02-19 [1] CRAN (R 4.2.3)
## xtable 1.8-4 2019-04-21 [2] CRAN (R 4.2.3)
## XVector 0.38.0 2022-11-01 [1] Bioconductor
## yaml 2.3.10 2024-07-26 [1] CRAN (R 4.2.3)
## zlibbioc 1.44.0 2022-11-01 [1] Bioconductor
##
## [1] /home/mls528/R/x86_64-pc-linux-gnu-library/4.2
## [2] /programs/R-4.2.3/lib64/R/library
##
## ──────────────────────────────────────────────────────────────────────────────